(54) Unified messaging system with automatic language identification for text-to-speech
conversion



(57) A unified messaging system includes a voice gateway server
coupled to an electronic mail system and a private branch exchange
(PBX). The voice gateway server provides voice messaging services to
a set of subscribers. Within the voice gateway server, a trigraph
analyzer sequentially examines 3-character combinations within a text
message; determines occurrence frequencies for the character
combinations; compares the occurrence frequencies with reference
occurrence statistics modeled from text samples written in particular
languages; and generates a language identifier and a likelihood value
for the text message. Based upon the language identifier, a message
inquiry unit selects an appropriate text-to-speech engine for
converting the text message into computer-generated speech that is
played to a subscriber.



FIG. 1






EP 0 889 628 A2 






Description 

Cross-Reference to Related Publications 

This application is a regular U.S. Application filed 
from and claiming priority of provisional application 
serial number 60/051,720, filed on July 3, 1997, and 
entitled "Unified Messaging System With Automatic 
Language Identification for Text-to-Speech Conversion." This
application relates to and incorporates by reference U.S. Patent
Number 5,557,659, entitled "Electronic Mail System Having Integrated
Voice Messages."

Field of the Invention 

The present invention relates to systems and meth- 
ods for voice and text messaging, as well as systems 
and methods for language recognition. More particularly,
the present invention is a communications system that 
automatically identifies a language associated with a 
text message, and performs an appropriate text-to- 
speech conversion. 

Background of the Invention 

Computer-based techniques for converting text into 
speech have become well-known in recent years. Via 
such techniques, textual data is translated to audio 
information by a text-to-speech conversion "engine," 
which most commonly comprises software. Examples 
of text-to-speech software include Apple Computer's 
Speech Manager (Apple Computer Corporation, Cupertino,
CA), and Digital Equipment Corporation's DECTalk
(Digital Equipment Corporation, Cambridge, MA). In 
addition to converting textual data into speech, such 
software is responsive to user commands for controlling 
volume, pitch, rate, and other speech-related parame- 
ters. 

A text-to-speech engine generally comprises a text 
analyzer, a syntax and context analyzer, and a synthe- 
sis module. The text analyzer, in conjunction with the 
syntax and context analyzer, utilizes a rule-based index 
to identify fundamental grammatical units within textual
data. The fundamental grammatical units are typically 
word and/or phoneme-based, and the rule-based index 
is correspondingly referred to as a phoneme library. 
Those skilled in the art will understand that the pho- 
neme library typically includes a word-based dictionary 
for the conversion of orthographic data into a phonemic 
representation. The synthesis module either assembles 
or generates speech sequences corresponding to the 
identified fundamental grammatical units, and plays the 
speech sequences to a listener. 

Text-to-speech conversion can be very useful within 
the context of unified or integrated messaging systems. 
In such messaging systems, a voice processing server 
is coupled to an electronic mail system, such that a 
user's e-mail in-box provides message notification as
well as access to messaging services for e-mail messages, voice
messages, and possibly other types of messages such as faxes. An
example of a unified messaging system is Octel's Unified Messenger
(Octel Communications Corporation, Milpitas, CA). Such systems
selectively translate an e-mail message into speech through the use
of text-to-speech conversion. A user calling from a remote telephone
can therefore readily listen to both voice and e-mail messages. Thus,
a unified messaging system employing text-to-speech conversion
eliminates the need for a user to have direct access to their
computer during message retrieval operations.

In many situations, messaging system users can expect to receive
textual messages written in different languages. For example, a
person conducting business in Europe might receive e-mail messages
written in English, French, or German. To successfully convert text
into speech within the context of a particular language requires a
text-to-speech engine designed for that language. Thus, to
successfully convert French text into spoken French requires a
text-to-speech engine designed for the French language, including a
French-specific phoneme library. Attempting to convert French text
into spoken language through the use of an English text-to-speech
engine would likely produce a large amount of unintelligible output.

In the prior art, messaging systems rely upon a human reader to
specify a given text-to-speech engine to be used in converting a
message into speech. Alternatively, some systems enable a message
originator to specify a language identification code that is sent
with the message. Both approaches are inefficient and inconvenient.
What is needed is a messaging system providing automatic written
language identification as a prelude to text-to-speech conversion.

Summary of the Invention 

The present invention is a unified messaging system providing
automatic language identification for the conversion of textual
messages into speech. The unified messaging system comprises a voice
gateway server coupled to a computer network and a Private Branch
Exchange (PBX). The computer network includes a plurality of
computers coupled to a file server, through which computer users
identified in an electronic mail (e-mail) directory exchange
messages. The voice gateway server facilitates the exchange of
messages between computer users and a telephone system, and
additionally provides voice messaging services to subscribers, each
of whom is preferably a computer user identified in the e-mail
directory.

The voice gateway server preferably comprises a voice board, a
network interface unit, a processing unit, a data storage unit, and a
memory wherein a set of voice messaging application units; a message
buffer; a plurality of text-to-speech engines and corresponding
phoneme libraries; a trigraph analyzer; and a set of corecurrence
libraries reside. Each voice messaging application unit comprises
program instructions for providing voice messaging functions such as
call answering, automated attendant, and message store/forward
operations to voice messaging subscribers.

A message inquiry unit directs message playback operations. In
response to a subscriber's issuance of a voice message review
request, the message inquiry unit plays the subscriber's voice
messages in a conventional manner. In response to a text message
review request, the message inquiry unit initiates automatic language
identification operations, followed by a text-to-speech conversion
performed in accordance with the results of the language
identification operations.

The trigraph analyzer examines a text sequence, and performs
language identification operations by first determining the
occurrence frequencies of sequential 3-character combinations within
the text, and then comparing the determined occurrence frequencies
with reference occurrence statistics for various languages. The set
of reference occurrence statistics associated with a given language
is stored together as a corecurrence library. The trigraph analyzer
determines a closest match between the determined occurrence
frequencies and a particular corecurrence library, and returns a
corresponding language identifier and likelihood value to the message
inquiry unit.

The message inquiry unit subsequently selects a text-to-speech
engine and an associated phoneme library, and initiates the
conversion of the text message into computer-generated speech that is
played to the subscriber in a conventional manner.

Brief Description of the Drawings

Figure 1 is a block diagram of a preferred embodiment of a unified
messaging system constructed in accordance with the present
invention;

Figure 2 is a block diagram of a first and preferred embodiment of a
voice server constructed in accordance with the present invention;

Figure 3 is a flowchart of a first and preferred method for providing
automatic language identification for text-to-speech conversion in
the present invention;

Figure 4 is a block diagram of a second embodiment of a voice server
constructed in accordance with the present invention; and

Figure 5 is a flowchart of a second method for providing automatic
language identification for text-to-speech conversion in the present
invention.

Description of the Embodiments 

Referring now to Figure 1, a block diagram of a preferred embodiment
of a unified messaging system 100 constructed in accordance with the
present invention is shown. The unified messaging system 100
comprises a set of telephones 110, 112, 114 coupled to a Private
Branch Exchange (PBX) 120; a computer network 130 comprising a
plurality of computers 132 coupled to a file server 134 via a network
line 136, where the file server 134 is additionally coupled to a data
storage device 138; and a voice gateway server 140 that is coupled to
the network line 136, and coupled to the PBX 120 via a set of
telephone lines 142 as well as an integration link 144. The PBX 120
is further coupled to a telephone network via a collection of trunks
122, 124, 126. The unified messaging system 100 shown in Figure 1 is
equivalent to that described in U.S. Patent No. 5,557,659, entitled
"Electronic Mail System Having Integrated Voice Messages," which is
incorporated herein by reference. Those skilled in the art will
recognize that the teachings of the present invention are applicable
to essentially any unified or integrated messaging environment.

In the present invention, conventional software executing upon the
computer network 130 provides file transfer services, group access to
software applications, as well as an electronic mail (e-mail) system
through which computer users can transfer messages as well as message
attachments between their computers 132 via the file server 134. In
an exemplary embodiment, Microsoft Exchange™ software (Microsoft
Corporation, Redmond, WA) executes upon the computer network 130 to
provide such functionality. Within the file server 134, an e-mail
directory associates each computer user's name with a message storage
location, or "in-box," and a network address, in a manner that will
be readily understood by those skilled in the art. The voice gateway
server 140 facilitates the exchange of messages between the computer
network 130 and a telephone system. Additionally, the voice gateway
server 140 provides voice messaging services such as call answering,
automated attendant, voice message store and forward, and message
inquiry operations to voice messaging subscribers. In the preferred
embodiment, each subscriber is a computer user identified in the
e-mail directory, that is, having a computer 132 coupled to the
network 130. Those skilled in the art will recognize that in an
alternate embodiment, the voice messaging subscribers could be a
subset of computer users. In yet another alternate embodiment, the
computer users could be a subset of a larger pool of voice messaging
subscribers, which might be useful when the voice gateway server is
primarily used for call answering.

Referring also now to Figure 2, a block diagram of a first and
preferred embodiment of a voice gateway server 140 constructed in
accordance with the present invention is shown. In the preferred
embodiment, the voice gateway server 140 comprises a voice board 200,
a network interface unit 202, a processing unit 204, a data storage
unit 206, and a memory 210 wherein a plurality of voice messaging
application units 220, 222, 224, 226; a message buffer 230; a set of
text-to-speech engines 242, 243, 244 and corresponding phoneme
libraries 252, 253, 254; a trigraph analyzer 260; and a plurality of
corecurrence libraries 272, 273, 274, 275, 276 reside. Each element
within the voice gateway server 140 is coupled to a common bus 299.
The network interface unit 202 is additionally coupled to the network
line 136, and the voice board 200 is coupled to the PBX 120.

The voice board 200 preferably comprises conven- 
tional circuitry that interfaces a computer system with 
telephone switching equipment, and provides telephony 
and voice processing functions. The network interface 
unit 202 preferably comprises conventional circuitry that
manages data transfers between the voice gateway 
server 140 and the computer network 130. In the pre- 
ferred embodiment, the processing unit 204 and the 
data storage unit 206 are also conventional. 

The voice messaging application units 220, 222, 
224, 226 provide voice messaging services to subscrib- 
ers, including call answering, automated attendant and 
voice message store and forward operations. A mes- 
sage inquiry unit 226 directs telephone-based message 
playback operations in response to a subscriber 
request. In response to a voice message review 
request, the message inquiry unit 226 initiates the 
retrieval of a voice message associated with the sub- 
scriber's in-box, followed by the playing of the voice 
message to the user via the telephone in a conventional 
manner. In response to a text message review request,
the message inquiry unit 226 initiates retrieval of a text
message associated with the subscriber's in-box, followed
by automatic language recognition and text-to-speech
conversion operations, as described in detail
below with reference to Figure 3. In the preferred
embodiment, each voice messaging application unit 
220, 222, 224, 226 comprises program instruction 
sequences that are executable by the processing unit 
204. 

The message buffer 230 comprises a portion of the memory 210 reserved
for temporarily storing messages before or after message exchange
with the file server 134. The text-to-speech engines 242, 243, 244,
245, 246 preferably comprise conventional software for translating
textual data into speech. Those skilled in the art will readily
understand that in an alternate embodiment, one or more portions of a
text-to-speech engine 242, 243, 244, 245, 246 could be implemented
using hardware.

The number of text-to-speech engines 242, 243, 244 resident within
the memory 210 at any given time is determined according to the
language environment in which the present invention is employed. In
the preferred embodiment, the memory 210 includes a text-to-speech
engine 242, 243, 244 for each language within a group of
most-commonly expected languages. Additional text-to-speech engines
245, 246 preferably reside upon the data storage unit 206, and are
loaded into the memory 210 when text-to-speech conversion for a
language outside the aforementioned group is required, as described
in detail below. In an exemplary embodiment, text-to-speech engines
242, 243, 244 corresponding to English, French, and German reside
within the memory 210, while text-to-speech engines 245, 246 for
Portuguese, Italian, and/or other languages reside upon the data
storage unit 206. Those skilled in the art will recognize that in an
alternate embodiment, the number of text-to-speech engines 242, 243,
244 resident within the memory could be determined according to a
memory management technique, such as virtual memory methods, where
text-to-speech engines 242, 243, 244 are conventionally swapped out
to the data storage unit 206 as required.

The memory 210 preferably includes a conventional phoneme library
252, 253, 254 corresponding to each text-to-speech engine 242, 243,
244 residing therein. In the preferred embodiment, a phoneme library
255, 256 also resides upon the data storage unit 206 for each
text-to-speech engine 245, 246 stored thereupon.

The present invention preferably relies upon n-graph methods for
textual language identification, in particular, techniques developed
by Clive Souter and Gavin Churcher at the University of Leeds in the
United Kingdom, as reported in 1) "Bigram and Trigram Models for
Language Identification and Classification," Proceedings of the AISB
Workshop on Computational Linguistics for Speech and Handwriting
Recognition, University of Leeds, 1994; 2) "Natural Language
Identification Using Corpus-Based Models," Hermes Journal of
Linguistics 13: 183-204, 1994; and 3) "N-gram Tools for Generic
Symbol Processing," M. Sc. Thesis of Phil Cave, School of Computer
Studies, University of Leeds, 1995.

In n-graph language identification, the occurrence frequencies of
successive n-character combinations within a textual message are
compared with reference n-character occurrence statistics associated
with particular languages. The reference statistics for any given
language are automatically derived or modeled from text samples taken
from that language. Herein, the reference n-character occurrence
statistics for a given language are stored together as a corecurrence
library 272, 273, 274, 275, 276.
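
As an illustration only, the derivation of reference statistics from text samples might be sketched as follows. The function names, the lowercase and whitespace normalization, and the use of relative frequencies are assumptions made for this sketch; the text does not specify how the Leeds tools model a language.

```python
from collections import Counter


def trigraph_frequencies(text: str) -> dict[str, float]:
    """Relative frequency of each successive 3-character combination."""
    # Assumed normalization: lowercase and collapse runs of whitespace.
    normalized = " ".join(text.lower().split())
    counts = Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}


def build_corecurrence_library(samples: list[str]) -> dict[str, float]:
    """Model one language from its text samples (hypothetical helper).

    Samples are joined with a space, so a few trigraphs span sample
    boundaries; with large samples that effect is negligible.
    """
    return trigraph_frequencies(" ".join(samples))
```

One library of this kind would be built per supported language and stored alongside the corresponding text-to-speech engine.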

The present invention preferably employs the trigraph analyzer 260
and corecurrence libraries 272, 273, 274, 275, 276 to perform
trigraph-based language identification, that is, language
identification based upon the statistical occurrences of three-letter
combinations. In the preferred embodiment, the memory 210 includes a
corecurrence library 272, 273, 274, 275, 276 corresponding to each
text-to-speech engine 242, 243, 244 within the memory 210 as well as
each text-to-speech engine 245, 246 stored upon the data storage
device 206.

The trigraph analyzer 260 returns a language identifier and a
likelihood or percentage value that indicates relative language
identification certainty. As developed at the University of Leeds,
the trigraph analyzer 260 is approximately 100% accurate when textual
input comprises at least 175 characters. The trigraph analyzer 260
additionally maintains high language identification accuracy,
typically greater than 90%, for shorter-length text sequences.

In an exemplary embodiment, the voice gateway 
server 140 is a personal computer having a 200 MHz 
Intel Pentium™ Processor (Intel Corporation, Santa 
Clara, CA); 128 Megabytes of Random Access Memory 
(RAM); an Ethernet-based network interface unit 202; a 
Redundant Array of Inexpensive Disks (RAID) drive 
serving as the data storage unit 206; a Rhetorex voice 
board (Rhetorex Corporation, San Jose, CA); DECTalk
text-to-speech engines 242, 243, 244, 245, 246 and
corresponding phoneme libraries 252, 253, 254, 255,
256 (Digital Equipment Corporation, Cambridge, MA);
the aforementioned trigraph analyzer 260 and associated
corecurrence libraries 272, 273, 274, 275, 276
developed at the University of Leeds; and voice messaging
application units 220, 222, 224, 226 implemented
using Octel's Unified Messenger software
(Octel Communications Corporation, Milpitas, CA).

Referring now to Figure 3, a flowchart of a first and 
preferred method for providing automatic language 
identification for text-to-speech conversion is shown. 
The preferred method begins in step 300 in response to 
a subscriber's issuance of a text message review 
request, with the message inquiry unit 226 retrieving a 
text message from the subscriber's in-box, or from a 
particular data file or folder as specified by the sub- 
scriber. In the preferred embodiment, the subscriber's 
in-box corresponds to a file server storage location, and 
the retrieved text message is transferred to the mes- 
sage buffer 230. Following step 300, the message 
inquiry unit 226 issues an identification directive to the 
trigraph analyzer 260 in step 302, thereby initiating lan- 
guage identification. 

In response to the identification directive, the tri- 
graph analyzer 260 examines successive 3-character 
combinations within the text message currently under 
consideration, and determines occurrence frequencies 
for the character combinations in step 304. In the pre- 
ferred embodiment, the trigraph analyzer 260 examines 
the first 175 characters of the text message in the event
that the text message is sufficiently long; otherwise, the 
trigraph analyzer 260 examines the longest character 
sequence possible. 
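
A minimal sketch of step 304 under stated assumptions: the 175-character window comes from the text above, while the function name and the choice to count raw occurrences (rather than normalized frequencies) are illustrative only.

```python
from collections import Counter

MAX_CHARS = 175  # window length stated in the description


def count_trigraphs(message: str, max_chars: int = MAX_CHARS) -> Counter:
    """Count successive 3-character combinations in the examined window.

    If the message is shorter than max_chars, slicing simply yields the
    whole message, i.e. the longest character sequence possible.
    """
    window = message[:max_chars]
    return Counter(window[i:i + 3] for i in range(len(window) - 2))
```

A message of fewer than three characters yields an empty counter, a case the description does not address and which a caller would need to handle.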

Following the determination of the occurrence frequencies for the
current text message, the trigraph analyzer 260 compares the
occurrence frequencies with the reference occurrence statistics in
each corecurrence library 272, 273, 274, 275, 276 and determines a
closest match with a particular corecurrence library 272, 273, 274,
275 in step 308. Upon determining the closest match, the trigraph
analyzer 260 returns a language identifier and an associated
likelihood value to the message inquiry unit 226 in step 310. Those
skilled in the art will recognize that the trigraph analyzer 260
could return a set of language identifiers and a likelihood value
corresponding to each language identifier in an alternate embodiment.
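
The description does not state how the closest match or the likelihood value is computed. The sketch below substitutes a simple histogram-intersection score (the sum, over observed trigraphs, of the smaller of the observed and reference relative frequencies), which lies between 0 and 1. The scoring rule and all names are assumptions, not the Leeds technique.

```python
from collections import Counter


def frequencies(text: str) -> dict[str, float]:
    """Relative trigraph frequencies of a text."""
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}


def identify_language(text: str, libraries: dict[str, dict[str, float]]):
    """Return (language identifier, likelihood) for the closest library.

    Assumed score: histogram intersection between the message's trigraph
    frequencies and each library's reference frequencies.
    """
    if not libraries:
        raise ValueError("at least one corecurrence library is required")
    observed = frequencies(text)
    scores = {
        lang: sum(min(p, ref.get(tri, 0.0)) for tri, p in observed.items())
        for lang, ref in libraries.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Returning the whole `scores` mapping instead of only the best entry would correspond to the alternate embodiment in which a set of identifiers and likelihood values is returned.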

As long as the text message is written in a language corresponding to
one of the corecurrence libraries 272, 273, 274, 275, 276, the
correlation between the occurrence frequencies and the reference
occurrence statistics is likely to be sufficient for successful
language identification. If the text message is written in a language
that does not correspond to any of the corecurrence libraries 272,
273, 274, 275, 276 present, the correlation will be poor, and a
closest match cannot be determined. In the event that the likelihood
value returned by the trigraph analyzer 260 is below a minimum
acceptable threshold (for example, 20%), the message inquiry unit 226
plays a corresponding prerecorded message to the subscriber via steps
312 and 318. An exemplary prerecorded message could be "language
identification unsuccessful."
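
The threshold test of steps 312 and 318 can be sketched as below. The 20% figure is the example given in the text; the function name, the tuple return convention, and the representation of likelihood as a fraction are assumptions.

```python
MIN_LIKELIHOOD = 0.20  # example threshold from the description (20%)


def next_action(language_id: str, likelihood: float,
                threshold: float = MIN_LIKELIHOOD) -> tuple[str, str]:
    """Decide between the prerecorded prompt and text-to-speech conversion."""
    if likelihood < threshold:  # "below a minimum acceptable threshold"
        return ("play_prompt", "language identification unsuccessful")
    return ("convert", language_id)
```

A likelihood exactly equal to the threshold is treated as acceptable here, since the text only rejects values below it.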

Upon receiving the language identifier and an acceptable likelihood
value, the message inquiry unit 226 selects the appropriate
text-to-speech engine 242, 243, 244, 245, 246 in step 314. In the
event that the text-to-speech engine 244, 245 and its associated
phoneme library 254, 255 do not presently reside within the memory
210, the message inquiry unit 226 transfers the required
text-to-speech engine 244, 245 and the corresponding phoneme library
254, 255 from the data storage unit 206 into the memory 210.
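
One way to picture step 314 is a two-level lookup in which engines absent from memory are copied in from storage on demand. The class below is a hypothetical sketch: the dictionaries merely stand in for the memory 210 and the data storage unit 206, and each value is a placeholder (engine, phoneme library) pair.

```python
class EngineStore:
    """Hypothetical model of engine selection with on-demand loading."""

    def __init__(self, in_memory: dict, on_storage: dict):
        self.in_memory = dict(in_memory)    # stands in for the memory 210
        self.on_storage = dict(on_storage)  # stands in for the storage unit 206

    def select(self, language_id: str):
        """Return the (engine, phoneme library) pair for a language."""
        if language_id not in self.in_memory:
            # Transfer the engine and its phoneme library into memory.
            # Raises KeyError if no engine exists for the language.
            self.in_memory[language_id] = self.on_storage[language_id]
        return self.in_memory[language_id]
```

For example, `EngineStore({"en": ("en-engine", "en-phonemes")}, {"pt": ("pt-engine", "pt-phonemes")}).select("pt")` returns the Portuguese pair and leaves it resident for later requests.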

After step 314, the message inquiry unit 226 issues a conversion
directive to the selected text-to-speech engine 242, 243, 244, 245,
246 in step 316, following which the text message currently under
consideration is converted to speech and played to the subscriber in
a conventional manner. Upon completion of step 316, the message
inquiry unit 226 determines whether another text message in the
subscriber's in-box, or as specified by the subscriber, requires
consideration in step 320. If so, the preferred method proceeds to
step 300; otherwise, the preferred method ends.
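
The overall flow of Figure 3 can be summarized in a short loop. In this sketch the identifier and the engines are passed in as plain callables, the result is a list of playback events rather than audio, and the loop is assumed to continue with the next message after an unsuccessful identification; none of these details are specified in the description.

```python
def review_text_messages(inbox, identify, engines, threshold=0.20):
    """Return the playback events produced for a subscriber's text messages.

    identify(message) -> (language_id, likelihood)   # hypothetical callable
    engines[language_id](message) -> speech object   # hypothetical callables
    """
    events = []
    for message in inbox:                            # steps 300 and 320
        language_id, likelihood = identify(message)  # steps 302 to 310
        if likelihood < threshold:                   # step 312
            events.append(("prompt", "language identification unsuccessful"))  # step 318
            continue
        speak = engines[language_id]                 # step 314
        events.append(("speech", speak(message)))    # step 316
    return events
```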

In an alternate embodiment, steps 312 and 318 could be omitted, such
that step 310 directly proceeds to step 314 to produce a "best guess"
text-to-speech conversion played to the subscriber. In such an
alternate embodiment, the message inquiry unit 226 could 1) disregard
the likelihood value; or 2) select the language identifier associated
with a best likelihood value in the event that multiple language
identifiers and likelihood values are returned.
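
Option 2) above amounts to taking the maximum over the returned pairs. A minimal sketch, assuming the analyzer's result is a list of (language identifier, likelihood) tuples:

```python
def best_guess(candidates: list[tuple[str, float]]) -> str:
    """Pick the language identifier with the highest likelihood value."""
    if not candidates:
        raise ValueError("no language identifiers were returned")
    language_id, _likelihood = max(candidates, key=lambda pair: pair[1])
    return language_id
```

No threshold is applied, so even a weak match is converted, which is the trade-off this alternate embodiment accepts.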

In the preferred embodiment, textual language identification is
performed, followed by text-to-speech conversion in the appropriate
language. This results in the subscriber listening to
computer-generated speech that matches the language in which the
original text message was written. In an alternate embodiment,
textual language identification could be performed, followed by
text-to-text language conversion (i.e., translation), followed by
text-to-speech conversion such that the subscriber listens to
computer-generated speech in a language with which the subscriber is
most comfortable. To facilitate this alternate embodiment, a set of
subscriber language preference selections are stored as
user-configuration data within a subscriber information database or
directory. The subscriber information database could reside within
the voice gateway server 140, or it could be implemented in
association with the file server's e-mail directory in a manner those
skilled in the art will readily understand. Additionally, the voice
gateway server 140 is modified to include additional elements, as
described in detail hereafter.

Referring now to Figure 4, a block diagram of a second embodiment of
a voice gateway server 141 constructed in accordance with the present
invention is shown. Elements common to both Figures 2 and 4 are
numbered alike for ease of understanding. In addition to having the
elements shown in Figure 2, the second embodiment of the voice
gateway server 141 includes a set of conventional text translators
282, 283, 284, 285, 286, each having an associated word dictionary
292, 293, 294, 295, 296. Those skilled in the art will understand
that the word dictionaries 292, 293, 294, 295, 296 are distinct from
(i.e., not equivalent to) the phoneme libraries 252, 253, 254, 255,
256 in content and manner of use, and that each text translator 282,
283, 284, 285, 286 corresponds to a particular target language
available for subscriber selection. Text translators 282, 283, 284
and word dictionaries 292, 293, 294 corresponding to most-common
subscriber preference selections reside within the memory 210, while
those for less-frequently selected languages reside upon the data
storage device 206, to be transferred into the memory 210 as
required. Those skilled in the art will also understand that in an
alternate embodiment, the text translators 282, 283, 284, 285, 286
and corresponding word dictionaries 292, 293, 294, 295, 296 could
normally reside upon the data storage device 206, to be swapped into
or out of the memory 210 as required during system operation. In an
exemplary embodiment, the text translators 282, 283, 284, 285, 286
and word dictionaries 292, 293, 294, 295, 296 could be implemented
using commercially-available software such as that provided by
Translation Experts, Ltd. of London, England; or Language Partners
International of Evanston, IL.

Referring now to Figure 5, a flowchart of a second method for
providing automatic language identification for text-to-speech
conversion is shown. The second method begins in step 500 in response
to a subscriber's issuance of a text message review request, with the
message inquiry unit 226 retrieving the subscriber's language
preference settings. Next, in step 501, the message inquiry unit
retrieves a text message from the subscriber's in-box or from a data
file or data folder as specified by the subscriber, and stores or
copies the retrieved message into the message buffer 230. Following
step 501, the message inquiry unit 226 issues an identification
directive to the trigraph analyzer 260 in step 502, thereby
initiating language identification. Language identification is
preferably performed in steps 504 through 512 in an analogous manner
to that described above in steps 304 through 312 of Figure 3.
Successful language identification results when the trigraph analyzer
260 returns a language identifier and a likelihood value greater than
a minimum threshold value to the message inquiry unit 226.

Upon receiving a language identifier and an acceptable likelihood
value, the message inquiry unit 226 selects the appropriate text
translator 282, 283, 284, 285, 286 and associated word dictionary
292, 293, 294, 295, 296 and issues a translation directive in step
514, thereby performing the translation of the current text message
into the target language given by the subscriber's language
preference setting. Next, in step 516, the message inquiry unit 226
issues a conversion directive to the text-to-speech engine 242, 243,
244, 245, 246 that corresponds to the subscriber's language
preference settings, causing the conversion of the translated text
message to speech. The speech is preferably played to the subscriber
in a conventional manner. Upon completion of step 516, the message
inquiry unit 226 determines whether another text message in the
subscriber's in-box or as specified by the subscriber requires
consideration in step 520. If so, the preferred method proceeds to
step 501; otherwise, the preferred method ends.
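
For a single message, the second method can be sketched as identify, translate, then speak in the subscriber's preferred language. The callables and their signatures are hypothetical. Translators are keyed by target language, following the statement that each text translator corresponds to a target language. Skipping translation when the message is already in the preferred language is an assumption of this sketch, not something the description states.

```python
def speak_in_preferred_language(message, preferred, identify,
                                translators, engines, threshold=0.20):
    """Return speech in the preferred language, or None if identification fails.

    identify(message) -> (source_language, likelihood)
    translators[target](message, source_language) -> translated text
    engines[target](text) -> speech object
    """
    source, likelihood = identify(message)             # steps 502 to 512
    if likelihood < threshold:
        return None
    if source == preferred:
        text = message                                 # assumed shortcut
    else:
        text = translators[preferred](message, source)  # step 514
    return engines[preferred](text)                    # step 516
```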

Those skilled in the art will recognize that in the alternate
embodiment, each word dictionary 292, 293, 294, 295, 296 should
include words that may be particular to a given work environment in
which the present invention may be employed. For example, use of the
alternate embodiment in a computer-related business setting would
necessitate word dictionaries 292, 293, 294, 295, 296 that include
computer-related terms to ensure proper translation. In general, the
first and preferred embodiment of the present invention is more
robust and flexible than the second embodiment because direct
conversion of text into speech, without intermediate text-to-text
translation, is not constrained by the limitations of a word
dictionary and is less susceptible to problems arising from word
spelling variations.

From the above it can be seen that the present invention is related
to a unified messaging system and includes a voice gateway server
coupled to an electronic mail system and a private branch exchange
(PBX). The voice gateway server provides voice messaging services to
a set of subscribers. Within the voice gateway server, a trigraph
analyzer sequentially examines 3-character combinations; compares the
occurrence frequencies with reference occurrence statistics modeled
from text samples written in particular languages; and generates a
language identifier and a likelihood value for the text message.
Based upon the language identifier, a message inquiry unit selects an
appropriate text-to-speech engine for converting the text message
into computer-generated speech that is played to a subscriber.

While the present invention has been described 
with reference to certain preferred embodiments, those 
skilled in the art will recognize that various modifications 
can be provided. For example, a language identification 
tool based upon techniques other than n-graph meth- 
ods could be utilized instead of the trigraph analyzer 
260 and associated corecurrence libraries 272, 273,
274, 275, 276. As another example, one or more
text-to-speech engines 242, 243, 244, 245, 246 could be
implemented via hardware, such as through "off-board"
text-to-speech engines accessed through the use of remote
procedure calls. As yet another example, converted 
speech data or translated text data could be stored for 
future use, which could be useful in a store-once, multi- 
ple-playback environment. The description herein pro- 
vides for these and other variations upon the present 
invention, which is limited only by the following claims. 
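
As a purely illustrative sketch of the store-once, multiple-playback
variation, converted speech data could be cached under a key derived
from the message text and its language identifier. The SpeechCache
class, the fake_engine stand-in, and the choice of a SHA-256 digest
as the key are assumptions of this sketch and are not part of the
described system.

```python
import hashlib


class SpeechCache:
    """Stores converted speech keyed by message content and language."""

    def __init__(self, synthesize):
        # synthesize: callable taking (text, language) and returning bytes
        self._synthesize = synthesize
        self._store = {}

    def get_speech(self, text, language):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        key = (digest, language)
        if key not in self._store:
            # Convert once; later playbacks reuse the stored audio.
            self._store[key] = self._synthesize(text, language)
        return self._store[key]


calls = []


def fake_engine(text, language):
    """Stand-in for a text-to-speech engine; records each conversion."""
    calls.append((text, language))
    return f"[{language}] {text}".encode("utf-8")


cache = SpeechCache(fake_engine)
first = cache.get_speech("Hello", "en")
second = cache.get_speech("Hello", "en")  # served from the cache
```

Here the second request is answered from the stored data, so the
stand-in engine is invoked only once for the repeated message.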

Claims 

1. A method of operating language-based conversion
of a present text message into speech, the method 
comprising the following steps: 

a. retrieving the present text message; 

b. automatically generating a language identi- 
fier corresponding to the present text message; 

c. converting the present text message directly 
into computer-generated speech in response 
to the language identifier; and 

d. playing the computer generated speech to a 
subscriber. 

2. The method as claimed in claim 1, wherein the step
of converting is performed by a text-to-speech 
engine utilizing a phoneme library. 

3. The method as claimed in claim 1, further
comprising the following steps:

a. sensing a subsequent text message; and 

b. repeating the steps of retrieving, generating, 
converting, and playing in response to the step 
of sensing. 

4. The method as claimed in claim 1, wherein the step
of automatically generating further comprises the
following steps:

a. examining a sequence of characters of the 
present text message; 

b. forming an occurrence frequency of the 
present text message based upon the 
sequence of characters of the present text 
message; and 

c. matching the occurrence frequency with one
of a plurality of corecurrence libraries.

5. The method as claimed in claim 4, wherein the step
of matching further comprises the steps of:

a. comparing the occurrence frequency with
each of a plurality of reference frequencies
wherein each of the plurality of reference
frequencies corresponds to one of the plurality of
corecurrence libraries; and

b. determining a best match between the 
occurrence frequency and one of the plurality 
of reference frequencies. 

6. The method as claimed in claim 4, wherein the step
of examining comprises a trigraph analyzer for
inspecting a combination of three consecutive
characters within the sequence of characters.

7. The method as claimed in claim 4, wherein the
sequence of characters is found in a first portion of 
the present text message. 

8. The method as claimed in claim 4, wherein the step
of matching further comprises the following steps:

a. comparing the occurrence frequency with
each of a plurality of reference frequencies
wherein each of the plurality of reference
frequencies corresponds to one of the plurality of
corecurrence libraries; and

b. determining that a sufficient number of
matches exist between the occurrence
frequency and one of the plurality of reference
frequencies.

9. The method as claimed in claim 8, wherein the step
of matching is performed when there is the
sufficient number of matches between the occurrence
frequency and one of the plurality of reference
frequencies.

10. The method as claimed in claim 8, further
comprising the step of terminating the method when the
sufficient number of matches does not exist.

11. A method of providing language-based conversion 
of an original text message into speech for a user 
comprising the following steps:

a. retrieving the original text message;

b. automatically generating a language identi- 
fier corresponding to the original text message; 
and 

c. translating the original text message into a
translated text message in a user selected
language based upon the language identifier.

12. The method as claimed in claim 11, further
comprising the following steps:

a. converting the translated text message into 
computer generated speech based upon the 
user selected language; and 

b. playing the computer generated speech to 
the user. 

13. The method as claimed in claim 11, further com- 
prising the following step: polling the user for the 
user selected language. 

14. The method as claimed in claim 11, wherein the
step of automatically generating further comprises
the following steps:

a. examining a sequence of characters of the 
original text message; 

b. forming an occurrence frequency of the orig- 
inal text message from the sequence of char- 
acters of the original text message; and 

c. matching the occurrence frequency with one 
of a plurality of corecurrence libraries. 

15. The method as claimed in claim 14, wherein the
step of matching further comprises the following
steps:

a. comparing the occurrence frequency with 
each of a plurality of reference frequencies 
wherein each of the plurality of reference fre- 
quencies corresponds to one of the plurality of 
corecurrence libraries; and 

b. determining that there is a sufficient number 
of matches between the occurrence frequency 
and one of the plurality of reference frequen- 
cies. 

16. The method as claimed in claim 14, wherein the
step of matching further comprises the following
steps:

a. comparing the occurrence frequency with 
each of a plurality of reference frequencies 
wherein each of the plurality of reference fre- 
quencies corresponds to one of the plurality of 
corecurrence libraries; and 

b. determining a best match between the 
occurrence frequency and the plurality of refer- 
ence frequencies. 

17. A messaging system for converting a text message 
into computer generated speech, the system com- 
prising: 

a. means for storing the text message;

b. means for automatically generating a
language identifier corresponding to the text
message wherein the means for automatically
generating is coupled to the means for storing;
and

c. a text to speech engine coupled to the
means for storing wherein the text to speech
engine converts the text message into the
computer generated speech based upon the
language identifier.

18. The system as claimed in claim 17 wherein the
means for automatically generating further
comprises means for comparing an occurrence
frequency of the text message with a plurality of
reference frequencies wherein each reference
frequency corresponds to a particular corecurrence
library.

19. The system as claimed in claim 17 further
comprises a phoneme library coupled to the text to
speech engine for converting the text message into
the computer generated speech.

20. The system as claimed in claim 17 wherein the
means for automatically generating further
comprises a trigraph analyzer to formulate an
occurrence frequency of the text message based on
examining a combination of three consecutive
characters within the text message.

21. A voice messaging system for providing voice
messaging services to a set of subscribers, the voice
messaging system comprising:

a. means for retrieving a text message;

b. means for automatically generating a lan- 
guage identifier corresponding to the text mes- 
sage; 

c. means for converting the text message into
computer-generated speech based upon the
language identifier; and

d. means for playing the computer-generated 
speech to a subscriber. 

22. The voice messaging system according to claim 21
further comprising a voice gateway server
configured to be connected to a computer network and a
Private Branch Exchange, wherein the voice
gateway server facilitates.

23. A method of operating language-based conversion 
of a present text message into speech, the method 
comprising the following steps: 

a. retrieving the present text message;

b. automatically generating a language
identifier corresponding to the present text message;

c. selecting an appropriate text-to-speech
engine in response to the language identifier;

d. converting the present text message directly
into computer-generated speech in response
to the appropriate text-to-speech engine; and

e. playing the computer generated speech to a
subscriber.

24. In a voice messaging system providing voice
messaging services to a set of subscribers, a method
for language-based conversion of a text message
into speech comprising the steps of:

retrieving a text message;

automatically generating a language identifier
corresponding to the text message;

converting the text message into
computer-generated speech based upon the language
identifier; and

playing the computer-generated speech to a
subscriber.